[GPU] Optimize merge memory usage #136411
Pinging @elastic/es-search-relevance (Team:Search Relevance)
libs/simdvec/src/main/java/org/elasticsearch/simdvec/QuantizedByteVectorValuesAccess.java
"-Dio.netty.noUnsafe=true",
"-Dio.netty.noKeySetOptimization=true",
"-Dio.netty.recycler.maxCapacityPerThread=0",
// temporary until we get access to raw vectors in a future Lucene version
Is there an open Lucene issue or PR for that?
Not yet; depending on how #136416 goes, and the opinion of people more expert in Lucene (you, Chris, Ben), I'd like to generalize what we did there and raise a Lucene issue to have it.
FYI, I honestly don't see why Lucene would ever expose this information. It expands an API for no good purpose within Lucene.
Maybe not this API as-is, but I think there is value in being able to access, in a convenient and efficient way, what has already been written: it avoids writing the same data more than once, or keeping copies in memory, when we need the original data (e.g. raw vectors) to add "something" on top of it (e.g. quantized vectors, a graph, etc.).
(But maybe I'm naive)
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VectorsFormatReflectionUtils {
Wow, very nice organization! +1 for using VarHandle for reflection
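For readers unfamiliar with the pattern the reviewer is praising: a `VarHandle` obtained from a private lookup lets you read another class's private field with near-direct-access performance, because the handle is resolved once and reused. The sketch below is illustrative only (the `Holder` class and field name are hypothetical, not the PR's actual code):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class whose private state we want to read reflectively.
class Holder {
    private int count = 42;
}

public class VarHandleReflectionDemo {
    // Resolved once at class initialization and reused; far cheaper than
    // calling Field#setAccessible/Field#get on every access.
    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles
                .privateLookupIn(Holder.class, MethodHandles.lookup())
                .findVarHandle(Holder.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static int readCount(Holder h) {
        // The cast is required: VarHandle#get is signature-polymorphic.
        return (int) COUNT.get(h);
    }
}
```

Note that `privateLookupIn` requires the target class's module to be open to the caller, which is why this approach suits same-codebase utilities like the one in this PR.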
@ldematte Great work. I have not tested it yet, but the way you organized it is impressive. My main comment: do you think we can simplify this PR by breaking it into two separate ones, making this PR only about changes to merges, and doing the changes for flush, ResourcesHolder, and 128Mb in a separate PR? Or are these changes tightly coupled?
...rc/main/java/org/elasticsearch/index/codec/vectors/reflect/VectorsFormatReflectionUtils.java
...rc/main/java/org/elasticsearch/index/codec/vectors/reflect/VectorsFormatReflectionUtils.java
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java
I can do that: here is the PR #136464
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java
@ldematte Great changes. I have done some benchmarking on my laptop with int8, and I see great recall but, surprisingly, no speedups compared with the main branch:
gist: 1_000_000 docs; 960 dims; euclidean metric
cohere-wikipedia_v2: 934_024 docs; 768 dims; cosine metric
// problems with strides; the explicit copy removes the stride while copying.
// Note that this is _not_ an additional copy: input data needs to be moved to GPU memory anyway,
// we are just doing it explicitly instead of relying on CagraIndex#build to do it.
var deviceDataSet = dataset.toDevice(resourcesHolder.resources())
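To illustrate what "removing the stride while copying" means: when rows of a matrix are stored with a stride larger than the row width, compacting them into a tight buffer makes each row start exactly `dims` floats after the previous one. This standalone sketch shows the idea with plain arrays; the class and method names are illustrative, not the cuVS or PR API:

```java
// Sketch: compact a strided row-major float matrix into a contiguous buffer.
public class StrideCompactor {
    /**
     * Copies {@code rows} rows of {@code dims} floats each from a source
     * laid out with the given {@code stride} into a contiguous destination
     * whose effective stride equals {@code dims}.
     */
    public static float[] compact(float[] strided, int rows, int dims, int stride) {
        if (stride < dims) {
            throw new IllegalArgumentException("stride must be >= dims");
        }
        float[] contiguous = new float[rows * dims];
        for (int r = 0; r < rows; r++) {
            // Row r starts at r * stride in the source, r * dims in the target.
            System.arraycopy(strided, r * stride, contiguous, r * dims, dims);
        }
        return contiguous;
    }
}
```

As the diff comment notes, when the destination of such a copy is device memory, the compaction comes for free: the transfer had to happen anyway.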
Nice workaround; so you also confirmed that strides don't work properly with the Cagra index implementation?
Great work, @ldematte
This PR changes how we gather and compact vector data for transmission to the GPU. Instead of writing the compacted arrays out to a temporary file, we directly use the vector values from the scorer supplier, which are backed by a memory-mapped input. This way we avoid an additional copy of the data.
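The shape of the change can be sketched in isolation: rather than serializing each vector to a temp file and reading the file back, the merge path pulls each vector through a random-access view and gathers it straight into the flat buffer destined for the GPU. All names below are illustrative placeholders, not the PR's actual API; the real code works against the scorer supplier's vector values:

```java
import java.util.function.IntFunction;

// Sketch of the "no temp file" gather path, under assumed names.
public class DirectVectorGatherSketch {
    /**
     * Gathers {@code count} vectors of {@code dims} floats each into one
     * flat array, reading each vector on demand from {@code vectorByOrd}.
     * In the PR, the equivalent of this lambda is backed by a
     * memory-mapped input, so no intermediate file or duplicate
     * on-heap copy of the whole dataset is ever materialized.
     */
    public static float[] gather(IntFunction<float[]> vectorByOrd, int count, int dims) {
        float[] flat = new float[count * dims];
        for (int ord = 0; ord < count; ord++) {
            System.arraycopy(vectorByOrd.apply(ord), 0, flat, ord * dims, dims);
        }
        return flat;
    }
}
```

The design point is that the only full copy of the data that ever exists is the one the GPU transfer requires anyway.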